Blekinge County
Reflections on the Reproducibility of Commercial LLM Performance in Empirical Software Engineering Studies
Angermeir, Florian, Amougou, Maximilian, Kreitz, Mark, Bauer, Andreas, Linhuber, Matthias, Fucci, Davide, C., Fabiola Moyón, Mendez, Daniel, Gorschek, Tony
Large Language Models have gained remarkable interest in industry and academia. The increasing interest in LLMs in academia is also reflected in the number of publications on this topic over the last years. For instance, alone 78 of the around 425 publications at ICSE 2024 performed experiments with LLMs. Conducting empirical studies with LLMs remains challenging and raises questions on how to achieve reproducible results, for both researchers and practitioners. One important step towards excelling in empirical research on LLM and their application is to first understand to what extent current research results are eventually reproducible and what factors may impede reproducibility. This investigation is within the scope of our work. We contribute an analysis of the reproducibility of LLM-centric studies, provide insights into the factors impeding reproducibility, and discuss suggestions on how to improve the current state. In particular, we studied the 85 articles describing LLM-centric studies, published at ICSE 2024 and ASE 2024. Of the 85 articles, 18 provided research artefacts and used OpenAI models. We attempted to replicate those 18 studies. Of the 18 studies, only five were sufficiently complete and executable. For none of the five studies, we were able to fully reproduce the results. Two studies seemed to be partially reproducible, and three studies did not seem to be reproducible. Our results highlight not only the need for stricter research artefact evaluations but also for more robust study designs to ensure the reproducible value of future publications.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Europe > Portugal > Lisbon > Lisbon (0.05)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.05)
- (6 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
SEER: Sustainability Enhanced Engineering of Software Requirements
Roy, Mandira, Deb, Novarun, Chaki, Nabendu, Cortesi, Agostino
The rapid expansion of software development has significant environmental, technical, social, and economic impacts. Achieving the United Nations Sustainable Development Goals by 2030 compels developers to adopt sustainable practices. Existing methods mostly offer high-level guidelines, which are time-consuming to implement and rely on team adaptability. Moreover, they focus on design or implementation, while sustainability assessment should start at the requirements engineering phase. In this paper, we introduce SEER, a framework which addresses sustainability concerns in the early software development phase. The framework operates in three stages: (i) it identifies sustainability requirements (SRs) relevant to a specific software product from a general taxonomy; (ii) it evaluates how sustainable system requirements are based on the identified SRs; and (iii) it optimizes system requirements that fail to satisfy any SR. The framework is implemented using the reasoning capabilities of large language models and the agentic RAG (Retrieval Augmented Generation) approach. SEER has been experimented on four software projects from different domains. Results generated using Gemini 2.5 reasoning model demonstrate the effectiveness of the proposed approach in accurately identifying a broad range of sustainability concerns across diverse domains.
- Asia > India > West Bengal > Kolkata (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.04)
- (4 more...)
- Overview (1.00)
- Research Report > New Finding (0.46)
- Health & Medicine (1.00)
- Education (0.93)
- Banking & Finance > Economy (0.66)
- (2 more...)
Documenting Deployment with Fabric: A Repository of Real-World AI Governance
Jorgensen, Mackenzie, Brogle, Kendall, Collins, Katherine M., Ibrahim, Lujain, Shah, Arina, Ivanovic, Petra, Broestl, Noah, Piles, Gabriel, Dongha, Paul, Abdulhussein, Hatim, Weller, Adrian, Powers, Jillian, Bhatt, Umang
Artificial intelligence (AI) is increasingly integrated into society, from financial services and traffic management to creative writing. Academic literature on the deployment of AI has mostly focused on the risks and harms that result from the use of AI. We introduce Fabric, a publicly available repository of deployed AI use cases to outline their governance mechanisms. Through semi-structured interviews with practitioners, we collect an initial set of 20 AI use cases. In addition, we co-design diagrams of the AI workflow with the practitioners. We discuss the oversight mechanisms and guardrails used in practice to safeguard AI use. The Fabric repository includes visual diagrams of AI use cases and descriptions of the deployed systems. Using the repository, we surface gaps in governance and find common patterns in human oversight of deployed AI systems. We intend for Fabric to serve as an extendable, evolving tool for researchers to study the effectiveness of AI governance.
- North America > United States (1.00)
- Asia > Singapore (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (3 more...)
- Questionnaire & Opinion Survey (0.88)
- Research Report (0.82)
- Workflow (0.69)
- Personal > Interview (0.66)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Banking & Finance (1.00)
- (3 more...)
CADRE: Customizable Assurance of Data Readiness in Privacy-Preserving Federated Learning
Hiniduma, Kaveen, Li, Zilinghan, Sinha, Aditya, Madduri, Ravi, Byna, Suren
Privacy-Preserving Federated Learning (PPFL) is a decentralized machine learning approach where multiple clients train a model collaboratively. PPFL preserves the privacy and security of a client's data without exchanging it. However, ensuring that data at each client is of high quality and ready for federated learning (FL) is a challenge due to restricted data access. In this paper, we introduce CADRE (Customizable Assurance of Data Readiness) for federated learning (FL), a novel framework that allows users to define custom data readiness (DR) metrics, rules, and remedies tailored to specific FL tasks. CADRE generates comprehensive DR reports based on the user-defined metrics, rules, and remedies to ensure datasets are prepared for FL while preserving privacy. We demonstrate a practical application of CADRE by integrating it into an existing PPFL framework. We conducted experiments across six datasets and addressed seven different DR issues. The results illustrate the versatility and effectiveness of CADRE in ensuring DR across various dimensions, including data quality, privacy, and fairness. This approach enhances the performance and reliability of FL models as well as utilizes valuable resources.
- North America > United States > Ohio (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Europe > Sweden > Blekinge County > Karlskrona (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Regional Government (0.93)
- Health & Medicine > Diagnostic Medicine > Imaging (0.47)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.46)
- Information Technology > Data Science > Data Mining > Big Data (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
Automatic Identification of Machine Learning-Specific Code Smells
Hamfelt, Peter, Britto, Ricardo, Rocha, Lincoln, Almendra, Camilo
Machine learning (ML) has rapidly grown in popularity, becoming vital to many industries. Currently, the research on code smells in ML applications lacks tools and studies that address the identification and validity of ML-specific code smells. This work investigates suitable methods and tools to design and develop a static code analysis tool (MLpylint) based on code smell criteria. This research employed the Design Science Methodology. In the problem identification phase, a literature review was conducted to identify ML-specific code smells. In solution design, a secondary literature review and consultations with experts were performed to select methods and tools for implementing the tool. We evaluated the tool on data from 160 open-source ML applications sourced from GitHub. We also conducted a static validation through an expert survey involving 15 ML professionals. The results indicate the effectiveness and usefulness of the MLpylint. We aim to extend our current approach by investigating ways to introduce MLpylint seamlessly into development workflows, fostering a more productive and innovative developer environment.
- South America > Brazil > Ceará > Fortaleza (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Sweden > Blekinge County > Karlskrona (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
A Query-Aware Multi-Path Knowledge Graph Fusion Approach for Enhancing Retrieval-Augmented Generation in Large Language Models
Wei, Qikai, Ning, Huansheng, Han, Chunlong, Ding, Jianguo
Retrieval Augmented Generation (RAG) has gradually emerged as a promising paradigm for enhancing the accuracy and factual consistency of content generated by large language models (LLMs). However, existing RAG studies primarily focus on retrieving isolated segments using similarity-based matching methods, while overlooking the intrinsic connections between them. This limitation hampers performance in RAG tasks. To address this, we propose QMKGF, a Query-Aware Multi-Path Knowledge Graph Fusion Approach for Enhancing Retrieval Augmented Generation. First, we design prompt templates and employ general-purpose LLMs to extract entities and relations, thereby generating a knowledge graph (KG) efficiently. Based on the constructed KG, we introduce a multi-path subgraph construction strategy that incorporates one-hop relations, multi-hop relations, and importance-based relations, aiming to improve the semantic relevance between the retrieved documents and the user query. Subsequently, we designed a query-aware attention reward model that scores subgraph triples based on their semantic relevance to the query. Then, we select the highest score subgraph and enrich subgraph with additional triples from other subgraphs that are highly semantically relevant to the query. Finally, the entities, relations, and triples within the updated subgraph are utilised to expand the original query, thereby enhancing its semantic representation and improving the quality of LLMs' generation. We evaluate QMKGF on the SQuAD, IIRC, Culture, HotpotQA, and MuSiQue datasets. On the HotpotQA dataset, our method achieves a ROUGE-1 score of 64.98\%, surpassing the BGE-Rerank approach by 9.72 percentage points (from 55.26\% to 64.98\%). Experimental results demonstrate the effectiveness and superiority of the QMKGF approach.
- Asia > China > Beijing > Beijing (0.04)
- North America > United States (0.04)
- Europe > Sweden > Blekinge County > Karlskrona (0.04)
Toolsuite for Implementing Multiagent Systems Based on Communication Protocols
Chopra, Amit K., Christie, Samuel H. V, Singh, Munindar P.
Interaction-Oriented Programming (IOP) is an approach to building a multiagent system by modeling the interactions between its roles via a flexible interaction protocol and implementing agents to realize the interactions of the roles they play in the protocol. In recent years, we have developed an extensive suite of software that enables multiagent system developers to apply IOP. These include tools for efficiently verifying protocols for properties such as liveness and safety and middleware that simplifies the implementation of agents. This paper presents some of that software suite.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > North Carolina (0.04)
- (24 more...)
AGI Enabled Solutions For IoX Layers Bottlenecks In Cyber-Physical-Social-Thinking Space
Khelloufi, Amar, Ning, Huansheng, Dhelim, Sahraoui, Ding, Jianguo
The integration of the Internet of Everything (IoX) and Artificial General Intelligence (AGI) has given rise to a transformative paradigm aimed at addressing critical bottlenecks across sensing, network, and application layers in Cyber-Physical-Social Thinking (CPST) ecosystems. In this survey, we provide a systematic and comprehensive review of AGI-enhanced IoX research, focusing on three key components: sensing-layer data management, network-layer protocol optimization, and application-layer decision-making frameworks. Specifically, this survey explores how AGI can mitigate IoX bottlenecks challenges by leveraging adaptive sensor fusion, edge preprocessing, and selective attention mechanisms at the sensing layer, while resolving network-layer issues such as protocol heterogeneity and dynamic spectrum management, neuro-symbolic reasoning, active inference, and causal reasoning, Furthermore, the survey examines AGI-enabled frameworks for managing identity and relationship explosion. Key findings suggest that AGI-driven strategies, such as adaptive sensor fusion, edge preprocessing, and semantic modeling, offer novel solutions to sensing-layer data overload, network-layer protocol heterogeneity, and application-layer identity explosion. The survey underscores the importance of cross-layer integration, quantum-enabled communication, and ethical governance frameworks for future AGI-enabled IoX systems. Finally, the survey identifies unresolved challenges, such as computational requirements, scalability, and real-world validation, calling for further research to fully realize AGI's potential in addressing IoX bottlenecks. we believe AGI-enhanced IoX is emerging as a critical research field at the intersection of interconnected systems and advanced AI.
- Asia > China > Beijing > Beijing (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Africa > Middle East > Algeria > Djelfa Province > Djelfa (0.04)
- (7 more...)
- Research Report > Promising Solution (1.00)
- Overview (1.00)
- Transportation (1.00)
- Telecommunications (1.00)
- Information Technology > Security & Privacy (1.00)
- (4 more...)
HarmonE: A Self-Adaptive Approach to Architecting Sustainable MLOps
Bhatt, Hiya, Biswas, Shaunak, Rakhunathan, Srinivasan, Vaidhyanathan, Karthik
Machine Learning Enabled Systems (MLS) are becoming integral to real-world applications, but ensuring their sustainable performance over time remains a significant challenge. These systems operate in dynamic environments and face runtime uncertainties like data drift and model degradation, which affect the sustainability of MLS across multiple dimensions: technical, economical, environmental, and social. While Machine Learning Operations (MLOps) addresses the technical dimension by streamlining the ML model lifecycle, it overlooks other dimensions. Furthermore, some traditional practices, such as frequent retraining, incur substantial energy and computational overhead, thus amplifying sustainability concerns. To address them, we introduce HarmonE, an architectural approach that enables self-adaptive capabilities in MLOps pipelines using the MAPE-K loop. HarmonE allows system architects to define explicit sustainability goals and adaptation thresholds at design time, and performs runtime monitoring of key metrics, such as prediction accuracy, energy consumption, and data distribution shifts, to trigger appropriate adaptation strategies. We validate our approach using a Digital Twin (DT) of an Intelligent Transportation System (ITS), focusing on traffic flow prediction as our primary use case. The DT employs time series ML models to simulate real-time traffic and assess various flow scenarios. Our results show that HarmonE adapts effectively to evolving conditions while maintaining accuracy and meeting sustainability goals.
- North America > United States > New York > New York County > New York City (0.04)
- Asia > India > Telangana > Hyderabad (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > Sweden > Blekinge County > Karlskrona (0.04)
- Energy (0.71)
- Transportation (0.67)
- Consumer Products & Services > Travel (0.35)
A Data Synthesis Method Driven by Large Language Models for Proactive Mining of Implicit User Intentions in Tourism
Wang, Jinqiang, Ning, Huansheng, Zhu, Tao, Ding, Jianguo
In the tourism domain, Large Language Models (LLMs) often struggle to mine implicit user intentions from tourists' ambiguous inquiries and lack the capacity to proactively guide users toward clarifying their needs. A critical bottleneck is the scarcity of high-quality training datasets that facilitate proactive questioning and implicit intention mining. While recent advances leverage LLM-driven data synthesis to generate such datasets and transfer specialized knowledge to downstream models, existing approaches suffer from several shortcomings: (1) lack of adaptation to the tourism domain, (2) skewed distributions of detail levels in initial inquiries, (3) contextual redundancy in the implicit intention mining module, and (4) lack of explicit thinking about tourists' emotions and intention values. Therefore, we propose SynPT (A Data Synthesis Method Driven by LLMs for Proactive Mining of Implicit User Intentions in the Tourism), which constructs an LLM-driven user agent and assistant agent to simulate dialogues based on seed data collected from Chinese tourism websites. This approach addresses the aforementioned limitations and generates SynPT-Dialog, a training dataset containing explicit reasoning. The dataset is utilized to fine-tune a general LLM, enabling it to proactively mine implicit user intentions. Experimental evaluations, conducted from both human and LLM perspectives, demonstrate the superiority of SynPT compared to existing methods. Furthermore, we analyze key hyperparameters and present case studies to illustrate the practical applicability of our method, including discussions on its adaptability to English-language scenarios. All code and data are publicly available.
- Asia > China > Beijing > Beijing (0.04)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Asia > China > Sichuan Province > Chengdu (0.04)
- (8 more...)
- Consumer Products & Services > Travel (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)